Cross-project defect prediction method based on feature selection and TrAdaBoost

Li LI, Kexin SHI, Zhenkang REN

Journal of Computer Applications 2022, 42 (5): 1554-1562. DOI: 10.11772/j.issn.1001-9081.2021050867

Cross-project software defect prediction can address the scarcity of training data in the project to be predicted. However, the source project and the target project usually differ substantially in data distribution, which reduces prediction performance. To solve this problem, a new Cross-Project Defect Prediction method based on Feature Selection and TrAdaBoost (CPDP-FSTr) was proposed. Firstly, in the feature selection stage, Kernel Principal Component Analysis (KPCA) was used to remove redundant data from the source project. Then, based on the attribute feature distributions of the source project and the target project, the candidate source project data closest in distance to the target project distribution were selected. Finally, in the instance transfer stage, the TrAdaBoost method improved by an evaluation factor was used to find the source project instances whose distribution was similar to that of the few labeled instances in the target project, and a defect prediction model was established. With F1 as the evaluation index, the proposed CPDP-FSTr was compared with methods such as cross-project software defect prediction using Feature Clustering and TrAdaBoost (FeCTrA) and Cross-project software defect prediction based on Multiple Kernel Ensemble Learning (CMKEL). Relative to these two methods, its prediction performance improved by 5.84% and 105.42% respectively on the AEEEM dataset and by 5.25% and 85.97% respectively on the NASA dataset, and its two-stage feature selection outperformed single-stage feature selection. Experimental results show that the proposed CPDP-FSTr achieves better prediction performance when the source project feature selection proportion and the target project labeled instance proportion are 60% and 20% respectively.
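The instance transfer stage builds on TrAdaBoost, so a minimal sketch of one round of the standard TrAdaBoost weight update may help make the idea concrete. This is the textbook update only: the evaluation-factor improvement described in the abstract is not specified there and is not reproduced here. The function name `tradaboost_reweight`, its parameters, and the toy data are assumptions made for illustration.

```python
import math


def tradaboost_reweight(weights, errors, n_source, n_rounds):
    """One round of the standard TrAdaBoost weight update (illustrative only).

    weights:  current instance weights, source instances first, then target.
    errors:   |h(x_i) - y_i| per instance (0 = correct, 1 = misclassified).
    n_source: number of source instances at the front of both lists (>= 1).
    n_rounds: total number of boosting rounds.
    """
    target_w = weights[n_source:]
    target_e = errors[n_source:]
    # The weighted error is measured on the labeled target instances only.
    eps = sum(w * e for w, e in zip(target_w, target_e)) / sum(target_w)
    # Clamp so beta_t stays finite and below 1 (the method assumes eps < 0.5).
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    # Fixed shrink factor for source instances.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))

    new_weights = []
    for i, (w, e) in enumerate(zip(weights, errors)):
        if i < n_source:
            # Misclassified source instances look unlike the target: shrink.
            new_weights.append(w * beta ** e)
        else:
            # Misclassified target instances need more attention: grow.
            new_weights.append(w * beta_t ** (-e))
    return new_weights


# Toy data (assumed): 4 source instances followed by 3 labeled target instances.
weights = [1.0] * 7
errors = [1, 0, 0, 1, 0, 0, 1]
updated = tradaboost_reweight(weights, errors, n_source=4, n_rounds=10)
print(updated)
```

Over repeated rounds, source instances that the weak learner keeps misclassifying lose weight, so the final model is dominated by source data that resembles the labeled target instances.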
